Abstract

Sample processing in Pure Data (Pd) is generally block-based,
while control or message data are computed one by one. Block
computation in Pd can be suspended or blocked to save CPU cycles.
Such “blocked signals” can be used as an optimization technique
for the computation of control data. This paper explores possible
applications of this “Blocked Signal Processing” (BSP) technique
and presents a system for physical modelling and one for feature
extraction as examples.
1 Introduction

Software for computer music and realtime synthesis has to deal
with two competing requirements: It has to compute audio samples
continuously at a rate specified by the underlying sample rate
(like 44.1 kHz for “CD quality”), and it has to deal with sporadic
events, for instance note events coming from MIDI streams. In many
computer music systems, the events of such control streams are
computed less often than the actual audio samples: In addition to
the sample rate as a defining parameter of a music software, a
slower control rate was introduced: “The control rate is the speed
at which significant changes in the sound synthesis process occur.
For example, the control rate in the simplest program would be the
rate at which the notes are played. [...] The idea of a control
rate is possible because many parameters of a sound are ‘slowly
varying’.” 1

1 [Dodge and Jerse, 1985, p. 70]

The separation of control and sample rate has been a part of
computer music software since its early days: “Among the languages
of the Music N family, Music IV and its derivatives (including
Music 4C) are sample-oriented, whereas Music V and Cmusic are
block-oriented. The Csound language is also block-oriented, since
it updates synthesis parameters at a control rate set by users.” 2

The introduction of different rates for audio and control streams
makes specific optimizations for each domain possible. For the
sporadic events of control streams, which happen only rarely
compared to the calculation of audio samples, redundant or
unnecessary calculations can be omitted.

Audio data, however, can be computed in blocks of samples. Instead
of computing every sample and then the next, several samples are
computed in one go, which significantly reduces the overhead in
systems based on “unit generators”: “This is done to increase the
efficiency of individual audio operations (such as Csound’s unit
generators and Max/MSP and Pd’s tilde objects). Each unit generator
or tilde object incurs overhead each time it is called, equal to
perhaps twenty times the cost of computing one sample on average.
If the block size is one, this means an overhead of 2,000%; if it
is sixty-four (as in Pd by default), the overhead is only some
30%.” 3
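The figures in this quotation follow from simple arithmetic, which the following Python sketch reproduces. The function name and the default of 20 sample-costs per call are illustrative assumptions taken from the "perhaps twenty times" estimate in the quotation; they are not measurements.

```python
def relative_overhead(block_size, call_cost_in_samples=20.0):
    """Per-call overhead relative to the cost of computing one block.

    call_cost_in_samples is an assumed constant: the cost of calling a
    unit generator, expressed in units of "cost of one sample".
    """
    return call_cost_in_samples / block_size


for block_size in (1, 64):
    overhead = relative_overhead(block_size)
    print(f"block size {block_size:2d}: overhead {overhead:.0%}")
```

With a block size of 1 the ratio is 20 (2,000%), and with Pd's default of 64 it is 0.3125, which the quotation rounds to "some 30%".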

Apart from avoiding the overhead of function calls for each unit
generator, blocked processing can also be implemented in a way
that reduces the number of memory allocations necessary. For this,
the block size has to be constant over a sufficient time.

But block computation also has a major disadvantage: It adds a
latency of at least one block duration. “Sample-oriented compilers
are more flexible, since every aspect of the computation can
change for any sample”. 4

2 [Roads, 1996, p. 801-802]

3 [Puckette, 2007, p. 63]

4 [Roads, 1996, p. 801]

In contrast to audio data, control streams usually are not
computed in blocks. As most control events happen only
sporadically, there may not be enough data to fill a block that
would be useful to compute. For instance, MIDI notes may be
produced only on every quarter note, which at a tempo of 120 BPM
amounts to only one note event every 500 milliseconds. Filling a
block of 60 quarter notes would then take half a minute - no sane
musician would accept a latency of that duration.
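The latency argument can be checked with a few lines of Python. This is only a worked calculation; the function name is hypothetical, and the audio figures (64 samples at 44.1 kHz) are the Pd defaults mentioned earlier.

```python
def block_fill_time(block_size, events_per_second):
    """Seconds needed to collect one full block of events."""
    return block_size / events_per_second


# One audio block: 64 samples arriving at 44100 samples per second.
audio_latency = block_fill_time(64, 44100)

# One block of 60 quarter notes at 120 BPM, i.e. 2 note events per second.
note_latency = block_fill_time(60, 2)

print(f"audio block: {audio_latency * 1000:.2f} ms")
print(f"note block:  {note_latency:.0f} s")
```

The audio block fills in about 1.45 ms, while the block of note events needs 30 seconds.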

While this is an extreme and admittedly a 
bit silly example, the number of control events 
often is small enough to not let the overhead of 
calling the unit generators for every event affect 
the overall performance of the system. 

But this is only valid as long as the number of events to compute
stays small. Especially in algorithmic composition, in simulations
or in similar cases, the amount of data in a “score” can become
very large. The overhead of control computations then becomes a
problem, and computation in blocks may yield a significant
performance gain. The results of the control computations may
still only be needed infrequently compared to the rate of audio
signals. Some modern languages like LuaAV or ChucK are designed to
deal with this problem right from the start. In ChucK, “the timing
mechanism allows for the control rate to be fully throttled by the
programmer - audio rates, control rates, and high-level musical
timing are unified under the same timing mechanism.” 5

Pure Data was not designed with variable control rates in mind,
but a peculiar feature of Pd can be exploited to do block
computations on control data, admittedly in a limited way. Pd can
suspend its own audio computations locally using the [switch~]
object. This object can stop all sample computations inside a Pd
subpatch or canvas and restart them on demand. A common use is to
activate parts of a Pd patch only when needed, for example to
manage voices in a polyphonic synthesizer: inactive voices can be
switched off when not in use to save CPU cycles.

5 [Wang and Cook, 2004]

A less common use case for switched-off subpatches is introduced
and explored in this paper. We use subpatches to perform block
computations at rates much lower than the audio sample rate. These
computations employ the unit generators originally intended for
audio signal computations. The computation rate is adjusted by
suspending blocks locally. We will call this approach “Blocked
Signal Processing” or BSP to have a catchy and short buzzword
available.

2 Blocked Signal Processing 

A very simple example will now show the basic principle of BSP in
Pd, how it can optimize certain actions and how it compares to
traditional message computations. The task solved by the following
two Pd code examples is simple: Read 4096 numbers stored in a
table “ORIG”, add 0.5 to each of them and store the results in a
table “RESULT”.



Figure 1: Transposing a table by 0.5 using message computations

Implemented with message objects, a counter is started by an
incoming bang message, its index number is used to read out the
table data, a control-rate addition object adds 0.5 to the value,
and then the value is stored.

Figure 2: Transposing a table by 0.5 using BSP computations

The BSP implementation uses a pair of [tabreceive~] and [tabsend~]
objects to read and write the tables as a block of samples. The
signal-rate addition object adds 0.5 in between. The
[switch~ 4096 1 1] object sets the block size to 4096 for this
sub-canvas, so that the table-accessing objects can process that
many samples. Additionally, it switches off the DSP computation
inside the subpatch at the beginning. So unless a message to
[switch~] switches on the computation, this part of the patch
doesn’t consume any CPU resources. By sending a “bang” message to
[switch~], the subpatch is switched on for exactly one block of
samples, then it is switched off again. During the time it is on,
the actual computation happens, so the end result is the same as
for the message version.
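The difference between the two variants can be mimicked outside of Pd. The following Python sketch is only an analogy under stated assumptions: the class `SwitchedSubpatch` and its methods are hypothetical stand-ins for a subpatch containing [switch~], not part of Pd. It shows one call per element in the message style versus one pass over the whole block that only runs after a bang.

```python
SIZE = 4096


def transpose_by_messages(orig):
    """Message style: one read, one addition, one store per element."""
    result = [0.0] * len(orig)
    for index in range(len(orig)):      # counter started by a bang
        value = orig[index]             # read one table element
        result[index] = value + 0.5     # control-rate addition, then store
    return result


class SwitchedSubpatch:
    """Block style: idle until banged, then processes one whole block."""

    def __init__(self, block_size):
        self.block_size = block_size
        self.armed = False              # starts switched off
        self.blocks_computed = 0

    def bang(self):
        self.armed = True               # request exactly one block

    def dsp_tick(self, orig, result):
        if not self.armed:
            return                      # switched off: nothing is computed
        n = self.block_size
        result[:n] = [x + 0.5 for x in orig[:n]]   # whole block in one pass
        self.armed = False              # switch off again
        self.blocks_computed += 1


orig = [i / SIZE for i in range(SIZE)]
result = [0.0] * SIZE
patch = SwitchedSubpatch(SIZE)

patch.dsp_tick(orig, result)    # no bang yet: no work is done
patch.bang()
patch.dsp_tick(orig, result)    # computes one block
patch.dsp_tick(orig, result)    # switched off again
```

Both variants produce the same table contents; the block variant simply does nothing on every tick that was not preceded by a bang.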

One small difference is important to note here: Probably for
performance reasons, Pd defers updates of the graphics in arrays
or tables that are used with active [tabsend~] objects. Changes to
the included data are only visible when the graph is closed and
opened again.

The two implementations of the “transposer” don’t show much
difference in CPU usage. This is expected, as the simple
calculations made here do not involve many objects, so the
overhead is negligible. Also, both the BSP and the traditional
method avoid any memory allocation by working directly on
pre-allocated tables. When dealing with lists of numbers of
arbitrary size, a common idiom in Pd is to build these lists by
prepending or appending elements to existing lists stored in
[list] objects. 6 For longer lists this can become a major cause
of slowdown in Pd patches.

3 [physigs] - Physical Modelling 
implemented with BSP 

The BSP technique should show more of its potential when applied
to algorithms involving a larger number of unit generators and
more parallel tasks. We will now turn to such a use case and take
a look at a physical modelling system based on spring-connected
particles.

6 See the list-help.pd file in Pd’s documentation for an 
example. This file also shows the opposite operation of 
serializing a list with [list split] which is slow as well. 


A particle simulation applies the Newtonian laws of mechanics to a
simulation of point masses. The physically inspired rules are used
to calculate velocities and accelerations of point particles,
which are defined by vectors describing their positions and
momenta. Every particle also includes a force accumulator that
holds any external forces applied to the particle. Forces,
positions and momenta fully specify the current state of a
particle system. Transitions from one state to the next are
calculated in discrete steps: Usually a world clock is employed
that advances one simulation step and initiates a new run of the
physics calculations to find the new positions of the particles.
As the same set of physical rules has to be applied to many
particles, the problem is an ideal candidate for doing the
calculations in blocks of particles.

With PMPD 7 and MSD 8, two implementations of this already exist
as extensions to Pd (so-called externals). For this paper a BSP
implementation of a particle system called “[physigs]” was written
in Pd and is tested against the MSD and PMPD implementations. Its
help file is shown in Fig. 7 at the end of this paper.

[physigs] is a particle simulation in two dimensions. It consists
of a main Pd abstraction called [physigs] that can be called with
a prefix tag, which makes it possible to use the object several
times in a single project. The state of the system is held in a
number of Pd [table] objects for positions, velocities, masses and
forces, and in a table that holds metadata. Currently only the
mobility state of masses is tracked there: A mass can be mobile or
fixed. A particle is identified by an integer number which is used
as a lookup index into the state-holding tables. Particle 10, for
example, would hold its x-position in the 10th element of the
table “pos-x”, its y-position would be the 10th element in table
“pos-y”, and so on for the tables “force-x”, “force-y”, “mobile”
or “mass”.

The size of these tables controls the size of the system: Tables
of size 64 can control up to 64 particles. All 64 particles are
computed regardless of whether they are actually being used. The
table size can be selected when creating the [physigs] object.

In Figure 3 the calculations that advance the 
simulation one step are shown. 

7 [Henry, 2004]

8 [Montgermont, 2005]

Figure 3: Advancing the world simulation one time step (patch
comments: “get state”, “set new position”)

At the top left, the current x-positions and x-velocities are read
using [tabreceive~] objects. Corresponding [tabsend~] objects
write the new positions back after the physical laws have been
applied. A table “force-x” holds the accumulated forces that have
been applied to each particle. The “mobile” table holds 0 when a
particle is fixed and 1 when it is mobile. Weights are stored in a
“mass” table. Some friction is applied to counter instabilities
and to simulate air drag. The actual algorithm is rather simple:
Any force on a particle is converted to an acceleration by
dividing it by the mass, the acceleration is added to the
velocity, which in turn is added to the position. The forces then
get reset to zero, a step not shown in the figure. The subpatches
“checkbounds” and “clip” restrict the possible positions to a
configurable range and, where necessary, flip the velocity to
simulate bouncing off the world’s boundaries. The same
calculations are made for the y-coordinates as well.
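The simulation step can be summarised in a short Python sketch for one coordinate. It is a reading of the description above, not a transcription of the patch: the function name, the friction value, the world limits and the exact order in which friction, the mobility mask and the bounds check are applied are all assumptions.

```python
def advance(state, friction=0.05, lower=-1.0, upper=1.0):
    """Advance every particle by one step along one axis.

    state maps table names ("pos", "vel", "force", "mass", "mobile")
    to equally sized lists, indexed by particle number.
    """
    for i in range(len(state["pos"])):
        accel = state["force"][i] / state["mass"][i]    # a = F / m
        vel = (state["vel"][i] + accel) * (1.0 - friction)
        vel *= state["mobile"][i]                       # 0 freezes fixed masses
        pos = state["pos"][i] + vel
        if pos < lower or pos > upper:                  # assumed bounce rule
            pos = min(max(pos, lower), upper)           # clip to the world
            vel = -vel                                  # flip the velocity
        state["pos"][i] = pos
        state["vel"][i] = vel
        state["force"][i] = 0.0                         # reset the accumulator


state = {
    "pos": [0.0, 0.0],
    "vel": [0.0, 0.0],
    "force": [1.0, 1.0],
    "mass": [2.0, 2.0],
    "mobile": [1.0, 0.0],   # particle 1 is fixed
}
advance(state, friction=0.0)
```

In the BSP patch the loop over particles is implicit: every signal object processes the whole block of particles at once.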

The system is driven by an outside [metro] whose period specifies
the speed of the physical simulation. Metro periods in “haptic”
ranges of 20 to 100 milliseconds are good choices. The [metro]
then regularly generates bang messages to switch the DSP signal
computations in the subpatch holding the simulation step on and
off.

A similar construction in [physigs] calculates the forces of
visco-elastic springs connecting two particles. These springs are
again defined by a number of tables, where each spring indexes the
tables with its integer ID. Any spring links two particles. The ID
numbers of these two particles are held in two state tables called
“link-m1” and “link-m2”. Other state tables hold parameters like
damping, stiffness, relaxed length and so on.

Springs generate forces that have to be accumulated into the
tables holding the forces for each particle. This currently is
realised in a separate step that doesn’t use the BSP technique,
because the “vanilla” version of Pd doesn’t offer an object that
can write into a table at a position specified by an audio signal.
Miller Puckette intends to include such an object in future
versions of Pd. The C code for the object is rather trivial, and
it already exists as an external. Unfortunately, falling back to
traditional control-rate calculations in this part destroys many
of the performance gains, as we will see in the benchmark section.
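Why indexed table writing is needed becomes clear when the force distribution is written out. The Python sketch below is a simplified, one-dimensional illustration with hypothetical names; it uses a purely elastic spring and leaves out the damping term, so it should not be read as the formula used in [physigs].

```python
def distribute_spring_forces(pos, force, link_m1, link_m2,
                             stiffness, rest_length):
    """Accumulate spring forces into the per-particle force table.

    Each spring s connects particles link_m1[s] and link_m2[s]. The
    write positions in `force` come from table data, which is the
    indexed ("scatter") write that Pd vanilla lacks at signal rate.
    """
    for s in range(len(link_m1)):
        a, b = link_m1[s], link_m2[s]
        distance = pos[b] - pos[a]
        stretch = abs(distance) - rest_length[s]
        direction = 1.0 if distance >= 0 else -1.0
        f = stiffness[s] * stretch * direction   # pulls a towards b if stretched
        force[a] += f                            # indexed write
        force[b] -= f                            # equal and opposite force


# Three particles daisy-chained by two springs of relaxed length 1.
pos = [0.0, 2.0, 3.0]
force = [0.0, 0.0, 0.0]
distribute_spring_forces(pos, force,
                         link_m1=[0, 1], link_m2=[1, 2],
                         stiffness=[1.0, 1.0], rest_length=[1.0, 1.0])
```

Computing `f` for all springs is a regular block operation, but the two accumulating writes depend on indices read from “link-m1” and “link-m2”, so they cannot be expressed with [tabsend~] alone.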

3.1 Performance comparison 

To compare the different implementations of a particle system, a
test system has been devised that creates a configurable number of
particles at random positions and with random mass and
daisy-chains them with spring links. Such a performance test is
part of the MSD distribution. Pd’s built-in “Load meter” has been
used to get rough CPU usage values. The world advances one step
every 50 milliseconds (or 2 * 25 msec in [physigs] for link and
mass computations each). Table 1 shows the results of several test
runs.

It turned out that [physigs] runs much slower than both MSD and
PMPD when the control-rate calculations are used to distribute the
spring forces to each particle. Switching off this part of the
patch in the BSP implementation lets [physigs] catch up with MSD
and beat PMPD.

MSD has practically no per-object overhead, as it computes the
full simulation in just a single object. PMPD, however, has
separate objects for each particle and link. The example patch
uses 4096 objects for the simulation participants alone (2048
particles plus 2048 links). So here the overhead is significant,
as expected.

[physigs], especially when used without the control-rate
workaround for a missing “write to table” object, turns out to be
a capable contender for physical modelling in Pd-vanilla or
Pd-almost-vanilla environments. Note that these benchmarks are
only meant to give a performance estimation and should not be
taken as “hard numbers”. But the author has successfully run
[physigs] on the “RjDj” version of Pure Data on the iPhone with
its much slower CPU compared to standard computers.

Simulation Type                     CPU Usage (%)

MSD                                 5
physigs                             30
physigs (w/o force distribution)    5
PMPD                                16

Table 1: Simulation of 2048 particles and 2048 springs at a metro
period of 50 ms using three different implementations

4 Feature extraction and analysis 
with BSP 

[physigs] generates data-heavy control streams in a completely
artificial manner. Similar amounts of data points have to be
handled when analysing external audio coming in over the ADC
(soundcard) and looking for certain features to guide a musical
composition. Pitch tracking or onset detection work on single
sound objects and must be prepared to react quickly. The
[sigmund~] or [bonk~] objects that are part of Pd thus run at
audio rate. In this case, applying BSP would not be viable. But if
a composer is interested in features on a “slower” time scale, BSP
can be applied.

As an example, let’s consider the difference in time scale between
physically moving from one room to another and playing two
different notes on a clarinet. Changing rooms takes much more time
than playing notes on an instrument. To detect the clarinet’s
pitch changes, the software has to be kept constantly informed of
the spectral content registered through the soundcard. However,
when trying to detect whether a person holding a microphone
changes rooms by analysing the spectral or reverberant
characteristics of the two environments, spectral snapshots can be
made and compared much less frequently than in the case of pitch
detection.

The author has developed a set of BSP feature extraction (FE)
objects mainly intended to be used in the RjDj version of Pd that
runs on mobile devices. The timbreID collection of Pd externals by
William Brent 9 served as an inspiration for these. Detecting
“changed rooms” is a common need of composers working for RjDj.
The analysis rate of the FE objects is configurable, similar to
the [physigs] object, by adapting the speed of a [metro] object
that blocks and activates them. Just as in the introductory
example of a “table transposer”, the objects work on shared table
data and extract statistical features like centroid, mean, energy
or flatness.

9 [Brent, 2009]. Newer versions of timbreID include both audio-
and control-rate versions of its analysis objects.

If the table to be analysed holds a magnitude spectrum, the
extracted features describe the frequency content of the sound
environment. Of course the objects may be used to compute
statistics of tables holding other kinds of data as well.

As is well known, spectrum analysis with Fourier transformations
is a relatively costly operation. In Pd it is already carried out
in subpatches, so the analysis is a natural candidate to be
“blocked” with the BSP technique. Of course the time resolution
gets worse in this case, and overlapping analysis is no longer
useful when there are pauses between adjoining runs of the
transformation. But if all that is needed is a “snapshot” of an
environment’s spectral status, BSP blocking is a viable way to
save CPU resources.

Results of a spectral analysis written to a table can be analysed
by several objects at the same time, forming feature vectors that
can be post-processed for classification. The costly Fourier
transformation has to be run only once for each vector.
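As an illustration of how several features can be derived from one shared table, the Python sketch below computes the four features named earlier from a single magnitude spectrum. The paper does not give formulas, so the definitions here are assumptions based on common usage: the centroid is the magnitude-weighted mean bin index and flatness is the ratio of geometric to arithmetic mean. The function name is hypothetical.

```python
import math


def extract_features(magnitudes):
    """Build a feature vector from one table of positive magnitudes."""
    n = len(magnitudes)
    total = sum(magnitudes)
    mean = total / n
    energy = sum(m * m for m in magnitudes)
    # Centroid in units of bin index, weighted by magnitude.
    centroid = sum(k * m for k, m in enumerate(magnitudes)) / total
    # Geometric mean via logarithms; requires strictly positive values.
    geometric_mean = math.exp(sum(math.log(m) for m in magnitudes) / n)
    flatness = geometric_mean / mean
    return {
        "mean": mean,
        "energy": energy,
        "centroid": centroid,
        "flatness": flatness,
    }


spectrum = [1.0, 1.0, 1.0, 1.0]     # stand-in for one FFT magnitude frame
features = extract_features(spectrum)
```

The spectrum is computed once and read four times, which mirrors the idea of several FE objects sharing one table.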



Figure 4: Calculating the arithmetic mean with 
BSP - main patch 






As an example of an FE-BSP object we will take a closer look at
the calculation of the mean of a table. The arithmetic mean needs
an accumulation of all table values, which is then divided by the
table size.

While it would be possible to divide every single value by the
table size before adding them, here the accumulation is calculated
first, then a single division is made. This gives a minor
speed-up, but here it should also illustrate how the execution
order of BSP calculations can be controlled.

With Pd’s normal message computations, [trigger] objects are used
to specify a certain execution order. With signal objects the
execution order can only be manipulated by laying out the
operations into subpatches that are connected by signal
patchcords. 10 The connected subpatches are calculated by Pd’s
main scheduler in the order in which they are connected, with
objects earlier in the chain calculated first (in contrast to Pd’s
depth-first scheduling for messages).

In Figure 4 both subpatches are connected like this, so first
every signal object in [pd accum] is run, then every signal object
in [pd get-mean]. Dummy signal inlets and outlets are included in
both subpatches to allow making these order-forcing connections.

The accumulation of all values in a table can be coded using only
a handful of objects: A one-pole recursive filter, as supplied by
Pd in the [rpole~] object, with a coefficient of 1 will accumulate
all data as long as it is switched on. The filter is reset to zero
after each run using the [bang~] object, which outputs a bang
after each DSP cycle.

The output of [rpole~] is written into a summing table using
[tabsend~]. The final position in this table will hold the
accumulated value.

In the second subpatch (Fig. 6) a [bang~] object transports this
value to the division by N before it is reported to the outlet
after the block calculation has completed. The mean calculation
has a latency of one block of samples, that is, the block size
divided by the sample rate. Upsampling inside of subpatches to
reduce this latency can be achieved by changing the third argument
of the switch objects or by sending them the respective messages.
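The accumulate-then-divide scheme can be traced in a few lines of Python. The recurrence y[n] = x[n] + a * y[n-1] is the one-pole filter that [rpole~] implements; the function names and the sample rate default are illustrative assumptions.

```python
def rpole_accumulate(block, coefficient=1.0):
    """One-pole recursive filter: y[n] = x[n] + coefficient * y[n-1].

    With a coefficient of 1 the output is the running sum of the input.
    The state starts at zero, as after the reset between two runs.
    """
    output = []
    previous = 0.0
    for x in block:
        previous = x + coefficient * previous
        output.append(previous)
    return output


def block_mean(table):
    """Mean from the last entry of the summing table, divided by N."""
    summing_table = rpole_accumulate(table)
    final_index = len(table) - 1            # "blocksize - 1"
    return summing_table[final_index] / len(table)


def block_latency_ms(block_size, sample_rate=44100.0):
    """Latency of one block in milliseconds (block size / sample rate)."""
    return 1000.0 * block_size / sample_rate


mean = block_mean([1.0, 2.0, 3.0, 4.0])
latency = block_latency_ms(4096)
```

For the four example values the running sums are 1, 3, 6 and 10, and the mean is 10 / 4 = 2.5. A block of 4096 samples at 44.1 kHz corresponds to a latency of roughly 93 ms.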

10 [Puckette, 2007, p. 212-216]

Figure 5: Subpatch “accumulate”: Accumulating table values with a
one-pole filter (patch comment: dummy outlet to get correct
execution order)

When implementing the FE objects, the lack of a Pd object to
selectively write a value into a table at a certain position
specified by an audio signal was especially limiting. For example,
while finding local extrema (minima or maxima) in a table is easy
by scanning through the table once and comparing two adjacent
table values, the author hasn’t yet found a way to efficiently
store the locations of these extrema for later use.

Indexed table writing would also make another workaround or “hack”
in the FE objects unnecessary: To calculate the geometric mean,
the product of all values in a table is needed, but there is no
“vanilla” way to feed the result of multiplying previous samples
back into the computation, as the one-pole filter does for
addition by adding its previous output values to its input, the
next sample in a block. The workaround currently applied is to
transform the multiplications into additions using logarithms.
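The logarithm workaround rests on the identity (x_1 * x_2 * ... * x_N)^(1/N) = exp((log x_1 + ... + log x_N) / N), which turns the running product into a running sum that an accumulating filter can compute. A minimal Python sketch, with a hypothetical function name and valid only for strictly positive table values:

```python
import math


def geometric_mean(table):
    """Geometric mean computed with additions of logarithms only."""
    log_sum = 0.0
    for x in table:
        log_sum += math.log(x)      # accumulation replaces multiplication
    return math.exp(log_sum / len(table))


value = geometric_mean([1.0, 4.0, 16.0])
```

The product of the three example values is 64, whose cube root is 4, and the function returns the same value up to floating-point rounding.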

5 Other Applications of BSP 

The BSP technique can be applied to various other areas where
parallel, block-based processing is needed. Examples would be
Cellular Automata, L-Systems or swarm/flock systems.





Figure 6: Subpatch “get-mean”: Calculating the mean from the
accumulated table and the table size (patch comments: dummy inlet
for execution order; "blocksize - 1" to get final value)

But even for much simpler daily tasks in the life of a Pd user,
BSP is worth a look. For example, to copy the content of one table
to another one, a blocked combination of [tabplay~] and
[tabwrite~] can be used.

6 Limitations of BSP 

Like many optimization techniques, BSP has several limitations
that have to be weighed against the possible performance gains.
One important problem is execution order. Pd alternates audio and
message computations. BSP, however, lives in a grey area between
the audio signal and control computations. New results will be
computed in the signal pass, where no other control calculations
happen. It is not possible to trigger other control events in the
middle of a BSP run.

The algorithms to be used in BSP cannot use recursion inside one
block, because feedback of results computed at a later point in
the DSP tree to an earlier point is not possible. The minimal
feedback delay time in Pd is one block; other constructs result in
“DSP loop detected” errors in Pd. This limit is expected to
complicate applying BSP in recursive algorithms like L-Systems.
The inclusion of a suitable table-writing object in Pd vanilla, as
mentioned above, could alleviate this problem a bit.


BSP deals only with numbers, so its applicability to text
processing is very limited, which affects the use of formal
grammars. The symbols used in alphabets of L-Systems or similar
systems based on rewrite rules have to be converted to numbers by
implementing a translation map.
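What such a translation map could look like is sketched below in Python. This is a hypothetical illustration, not part of the described Pd objects: the alphabet, the function names and the choice of consecutive integer codes are assumptions.

```python
def make_translation_map(alphabet):
    """Assign each symbol a numeric code and build the reverse map."""
    to_number = {symbol: code for code, symbol in enumerate(alphabet)}
    to_symbol = {code: symbol for symbol, code in to_number.items()}
    return to_number, to_symbol


def encode(text, to_number):
    """Turn a symbol string into numbers that fit into a Pd table."""
    return [float(to_number[symbol]) for symbol in text]


def decode(numbers, to_symbol):
    """Turn table contents back into the symbol string."""
    return "".join(to_symbol[int(number)] for number in numbers)


to_number, to_symbol = make_translation_map("F+-")
table = encode("F+F-", to_number)
```

With the alphabet "F+-", the string "F+F-" becomes the table contents 0, 1, 0, 2, and decoding restores the original string.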

7 Conclusions and future work 

For this paper, BSP has been successfully applied to a simple
physical simulation and to a growing set of feature extraction
objects. The [physigs] object will be refined and published under
an open source license, while the feature extraction objects will
become a part of the “rj” library developed for the RjDj
application. While BSP is a powerful and so far under-used
technique in Pd, it cannot magically transform Pd into a true
multiple-rate software. Music compilers like ChucK, SND-RT or
LuaAV still deal with the competing demands of variable control
rates in a cleaner and more flexible way.
Figure 7: Help-file for the [physigs] abstraction



